A method for calculating quantile function and its further use for data fitting

Abstract


This paper introduces a polynomial transformation model based on the Weibull distribution,
whereby an analytical representation of the quantile function can be obtained for many
probability distributions. First, the target random variable $x$ with a specified
distribution is expressed as a polynomial of a Weibull random variable $z$, whose
coefficients are conveniently determined by the percentile matching method. Then,
substituting $z$ with its quantile function $z = [-\ln(1-u)]^{1/4}$ gives the analytical
expression of the quantile function of $x$. Furthermore, using the probability weighted
moments matching method, this polynomial transformation model can be used for data fitting.
Numerical experiments show that the proposed model is capable of handling some
distributions close to binomial, which are difficult for the extant approaches, and that
the quantile functions of various distributions are accurately approximated within the
probit range $[10^{-4}, 1 - 10^{-4}]$.

1. Introduction

The polynomial transformation method is an easy-to-use procedure for simulating
various continuous distributions from a certain basic probability distribution, such as
the standard normal distribution [1], [?], and the standard logistic distribution [?].
This technique has been used in various practical settings (see [?], paragraph 2).

The polynomial transformation method is expressed as:

$$ x = \sum_{i=0}^{n} a_i z^i \tag{1} $$

where $x$ is the target random variable, $a_i$ ($i = 0, \ldots, n$) are undetermined
coefficients, and $z$ is a standard normal or standard logistic random variable.

The advantage of the logistic variable over the normal variable is that its inverse
cumulative distribution function (ICDF), or quantile function, has the closed form
$z = \ln[u/(1-u)]$. Substituting it into Eq. (1) gives the quantile function of $x$, which
is difficult to obtain analytically for most probability distributions. Using the
associated quantile function, the percentile of $x$ is evaluated directly for a given
percentage point, and random numbers with the specified distribution can be generated from
standard uniform deviates.

Email address: xaoshaoying@shu.edu.cn (Qing Xiao)

Preprint submitted to arXiv, August 26, 2015
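
As a minimal sketch of the idea above, the following Python snippet evaluates the
polynomial of Eq. (1) on the closed-form logistic quantile and uses it to turn uniform
deviates into samples. The function names and the coefficient values are hypothetical and
chosen only for illustration; they do not come from any fitted distribution.

```python
import math
import random

def logistic_quantile(u):
    """Closed-form ICDF of the standard logistic distribution: ln[u / (1 - u)]."""
    return math.log(u / (1.0 - u))

def polynomial_quantile(u, coeffs):
    """Evaluate x = sum_i a_i * z**i with z = logistic_quantile(u), by Horner's rule."""
    z = logistic_quantile(u)
    x = 0.0
    for a in reversed(coeffs):
        x = x * z + a
    return x

# Hypothetical coefficients a_0..a_3 (illustrative only, not a fitted model).
coeffs = [0.0, 0.55, 0.0, 0.002]

# Inverse-transform sampling: feed uniform deviates through the quantile function.
rng = random.Random(0)
samples = []
while len(samples) < 1000:
    u = rng.random()          # uniform on [0, 1)
    if u > 0.0:               # u == 0 would give an infinite logistic quantile
        samples.append(polynomial_quantile(u, coeffs))

print(min(samples), max(samples))
```

Because the quantile function is available in closed form, no numerical root finding on
the CDF is needed to generate the samples.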

As for the determination of the coefficients $a_i$, the raw moments, central moments
and L-moments have been employed to perform moment matching [?], [?]. Owing to the
mathematical difficulty, only polynomial models of degree 3 and 5 have been developed, and
these are limited to a portion of distributions [?], [?]. Nevertheless, the method shows
potential utility for the transformation between different types of probability
distributions.

In this paper, the polynomial transformation model is extended to a much higher
order (20th) in terms of the Weibull distribution, which also has an analytical
representation of the ICDF. When establishing the quantile function of the target
distribution, the percentile matching method is implemented, and the coefficients of the
model are conveniently obtained by an interpolation method. If the model is used to
represent data, the probability weighted moments (PWM) matching method is employed, which
requires solving a system of linear equations to determine the coefficients. Finally,
numerical examples are provided to check the proposed method.

2. Calculating the quantile function 

Let $x$ be a continuous random variable, and let $z$ be a Weibull random variable with CDF
$W(z)$. The distribution $W(z)$ considered is the Weibull distribution with parameters
$\lambda = 1$ and $k = 4$. The quantile function of $W(1, 4)$ is:

$$ z = [-\ln(1-u)]^{1/4} \tag{2} $$

where $u$ is a uniform variable, taking values over the interval [0, 1].
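
A small Python sketch of Eq. (2) is given below, together with the Weibull CDF so that the
round trip $W(W^{-1}(u)) = u$ can be checked. The function names are illustrative; the
general form $\lambda[-\ln(1-u)]^{1/k}$ is the standard Weibull quantile and reduces to
Eq. (2) for $\lambda = 1$, $k = 4$.

```python
import math

def weibull_cdf(z, lam=1.0, k=4.0):
    """CDF of the Weibull distribution W(lam, k) for z >= 0."""
    return 1.0 - math.exp(-((z / lam) ** k))

def weibull_quantile(u, lam=1.0, k=4.0):
    """Quantile function of W(lam, k); equals Eq. (2) when lam = 1 and k = 4."""
    # log1p(-u) computes ln(1 - u) accurately for small u.
    return lam * (-math.log1p(-u)) ** (1.0 / k)

# Round trip: the CDF applied to the quantile should return the original u.
for u in (0.001, 0.25, 0.5, 0.75, 0.999):
    z = weibull_quantile(u)
    print(f"u={u:<6} z={z:.6f} W(z)={weibull_cdf(z):.6f}")
```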

Substituting Eq. (2) into Eq. (1), the quantile function of $x$ is obtained:

$$ x = F^{-1}(u) \approx a_0 + a_1 [-\ln(1-u)]^{1/4} + \cdots + a_n [-\ln(1-u)]^{n/4} \tag{3} $$

The parameters in Eq. (3) can be conveniently evaluated by the percentile matching (PM)
method.

The basic idea of the PM method is very simple. For a given percentage $u$, Eq. (3) should
be satisfied. Select $k$ values of percentage $u_1, u_2, \ldots, u_k$, and evaluate
$x_k = F^{-1}(u_k)$ and $z_k = [-\ln(1-u_k)]^{1/4}$. Using the pair values $(u_k, x_k)$,
the polynomial model is obtained by the least squares method.
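
The PM procedure can be sketched in pure Python as follows. This is an illustration under
stated assumptions, not the implementation used in the paper: the target is a Rayleigh
distribution (chosen because its quantile $\sqrt{2}\, z^2$ is exactly a polynomial in $z$,
so the expected coefficients are known), the degree is lowered to 4, the end points 0.01
and 0.99 are arbitrary, and the least squares problem is solved through the normal
equations. For a degree as high as 20, a QR- or SVD-based solver would be a safer choice
because the normal equations become badly conditioned.

```python
import math

def weibull_z(u):
    """z = [-ln(1-u)]^(1/4), the quantile of W(1, 4) from Eq. (2)."""
    return (-math.log1p(-u)) ** 0.25

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_percentile_matching(target_quantile, degree, us):
    """Least squares fit of x_k = sum_i a_i z_k^i through the normal equations."""
    zs = [weibull_z(u) for u in us]
    xs = [target_quantile(u) for u in us]
    V = [[z ** i for i in range(degree + 1)] for z in zs]   # Vandermonde rows
    AtA = [[sum(row[i] * row[j] for row in V) for j in range(degree + 1)]
           for i in range(degree + 1)]
    Atx = [sum(row[i] * x for row, x in zip(V, xs)) for i in range(degree + 1)]
    return solve_linear(AtA, Atx)

def rayleigh_quantile(u, sigma=1.0):
    """Closed-form Rayleigh quantile, used here only as an illustrative target."""
    return sigma * math.sqrt(-2.0 * math.log1p(-u))

# 21 evenly spaced percentages; end points and degree are assumptions of this sketch.
us = [0.01 + k * (0.98 / 20) for k in range(21)]
coeffs = fit_percentile_matching(rayleigh_quantile, degree=4, us=us)
print([round(a, 6) for a in coeffs])   # a_2 should be close to sqrt(2), the rest near 0
```

Any other target with a computable quantile function can be passed in place of
`rayleigh_quantile`.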

It should be noted that the value of $[-\ln(1-u)]^{1/4}$ tends to 0 or to positive
infinity as $u$ approaches 0 or 1, respectively. In these regions, the quantile obtained
by the proposed model has a large error.

In this paper, a 20th order polynomial is employed, and 21 percentage points $u_k$ are
chosen evenly over the interval $[10^{-4}, 1 - 10^{-4}]$. Through numerical examples, it
is found that accurate results can be obtained for $u \in [10^{-4}, 1 - 10^{-4}]$.

To assess the proposed polynomial model, a comparison is carried out between the
percentiles from the original distribution and those obtained by the polynomial model.
$10^{4}$ percentages $u_k$ are chosen evenly over the interval $[10^{-4}, 1 - 10^{-4}]$.
For each percentage $u_k$, the absolute relative error is calculated:

$$ \varepsilon_k = \left| \frac{x_k^{*} - x_k}{x_k} \right| \times 100\%, \qquad x_k = F^{-1}(u_k), \qquad x_k^{*} = \sum_{i=0}^{n} a_i [-\ln(1-u_k)]^{i/4} \tag{4} $$

The reference values $x_k$ are computed with Matlab.

Testing for various probability distributions, the results are shown in Table 1.


Table 1: The relative error of quantiles for different distributions

Distribution    Parameters                          Maximum value of $\varepsilon_k$
Lognormal       $0 < \mu < 100$, $0 < \sigma < 1$    0.07%
Normal          $\mu = 0$, $\sigma = 1$              0.16%
Gamma           $1 < a < 100$, $1 < b < 100$         0.05%
Beta            $1.5 < a < 20$, $1.5 < b < 20$       0.085%
Rayleigh        $0 < \sigma < 200$                   0.0066%
Chi-square      $2 < \nu < 100$                      0.075%
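
The error measure of Eq. (4) can be sketched in Python as below. The Rayleigh target, the
grid of 999 percentages and the "perturbed" coefficient vector are assumptions made only
to have a case where the exact answer is known; they are not the settings behind Table 1.

```python
import math

def model_quantile(u, coeffs):
    """Eq. (3): x* = sum_i a_i [-ln(1-u)]^(i/4), evaluated by Horner's rule."""
    z = (-math.log1p(-u)) ** 0.25
    x = 0.0
    for a in reversed(coeffs):
        x = x * z + a
    return x

def relative_errors_percent(true_quantile, coeffs, us):
    """Eq. (4): eps_k = |x*_k - x_k| / |x_k| * 100 for each percentage u_k."""
    errs = []
    for u in us:
        x_true = true_quantile(u)
        x_model = model_quantile(u, coeffs)
        errs.append(abs(x_model - x_true) / abs(x_true) * 100.0)
    return errs

def rayleigh_quantile(u):
    """Rayleigh quantile with sigma = 1; equals sqrt(2) * z**2 exactly."""
    return math.sqrt(-2.0 * math.log1p(-u))

exact = [0.0, 0.0, math.sqrt(2.0)]          # exact coefficients for this target
perturbed = [0.0, 0.001, math.sqrt(2.0)]    # hypothetical slightly-off fit

us = [k / 1000.0 for k in range(1, 1000)]   # grid size is an assumption
print(max(relative_errors_percent(rayleigh_quantile, exact, us)))
print(max(relative_errors_percent(rayleigh_quantile, perturbed, us)))
```

Note that the relative error is undefined wherever the true quantile $x_k$ is zero, so a
target whose quantile crosses zero needs a grid that avoids that point.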


Weibull distributions with other parameters were also tried. It is found that the Weibull
distribution $W(\lambda, k)$ with $\lambda = 1$ and $3 < k < 5$ outperforms the others, and
$W(1, 4)$ stands out for its generality and accuracy.

For the T-distribution ($2 < \nu < 100$), the Weibull distribution $W(1, 6)$ is preferable.
In this case, 141 percentage points are chosen evenly over the interval
$[10^{-?}, 1 - 10^{-?}]$. Using the pair values $(u_k, x_k)$, the polynomial of degree 20
is established by the least squares method. The maximum value of $\varepsilon_k$ is 0.67%.

Other probability distributions with a closed-form ICDF have also been tried for $z$,
such as the uniform distribution, the exponential distribution and the logistic
distribution. Only the logistic distribution yields a good simulation, which holds in the
probit range $[10^{-4}, 1 - 10^{-4}]$.

3. Data fitting 

If the PM based polynomial model is used to fit distributions to data, a large sample
size is required to guarantee a precise value of the percentile $x_k = F^{-1}(u_k)$.
Therefore, for a small to moderate sample size, the moment matching method is preferable.

The statistical information of a random variable is characterized by its statistical
moments, so a good simulation of the target distribution can be achieved by equating the
moments of the polynomial model in Eq. (1) with those of the data. In general, the
standardized central moments are involved, and the calculation of the $r$th raw moment
$E\left[\left(\sum_{i=0}^{n} a_i z^i\right)^r\right]$ is indispensable, which leads to a
complicated computation when the degree of the polynomial is higher than 4 (see [?],
appendix A). Although L-moments have been employed to simplify the problem, and analytical
expressions of the coefficients have been obtained, the tedious mathematical derivation
limits their further use for polynomial models of a higher degree [?].

Note that the L-moments are defined as linear combinations of the probability weighted
moments (PWM) [?]. Mathematically, it is therefore equivalent to perform the moment
matching using PWM directly, and the coefficients can be obtained with a much simpler
procedure.

The PWM of the random variable $x$ is defined as:

$$ M_{p,r,s} = E\{ x^p \cdot [F(x)]^r \cdot [1 - F(x)]^s \} \tag{5} $$

For computational convenience, a particular type of PWM, $\beta_r = M_{1,r,0}$, is
considered:

$$ \beta_r = E\{ x \cdot [F(x)]^r \} = \int_{-\infty}^{+\infty} x \cdot F^r(x) \cdot f(x)\, dx \tag{6} $$

where $f(x)$ is the PDF.

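The $\beta_r$ of the polynomial model is:

$$ \beta_r = \int_{-\infty}^{+\infty} \left( \sum_{i=0}^{n} a_i z^i \right) \cdot W^r(z) \cdot w(z)\, dz = \sum_{i=0}^{n} a_i Q_{r,i,0}, \qquad Q_{r,i,0} = \int_{-\infty}^{+\infty} z^i \cdot W^r(z) \cdot w(z)\, dz = \int_{0}^{1} [W^{-1}(u)]^i \cdot u^r\, du \tag{7} $$

Since $W(z)$ and $w(z)$ are known functions, $Q_{r,i,0}$ can be integrated numerically in
terms of the Weibull distribution $W(1, 4)$.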
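
One possible way to carry out this numerical integration is sketched below, assuming the
integral $\int z^i\, W^r(z)\, w(z)\, dz$ of Eq. (7) is taken over $z \ge 0$ for the Weibull
distribution $W(1, k)$. Simpson's rule, the truncation point `z_max` and the number of
intervals are choices made for this sketch, not settings reported in the paper. For
$r = 0$ the result is the $i$th Weibull moment $\Gamma(1 + i/k)$, which gives a convenient
check.

```python
import math

def weibull_pwm_basis(r, i, k=4.0, z_max=3.0, n=2000):
    """Integral of z^i * W(z)^r * w(z) over z >= 0 for W(1, k), by Simpson's rule.

    The integrand decays like exp(-z^k), so truncating at z_max = 3 is harmless
    for k = 4 (exp(-81) is far below double precision). Assumes k >= 1 and n even.
    """
    def integrand(z):
        e = math.exp(-(z ** k))
        w = k * z ** (k - 1.0) * e      # Weibull PDF w(z)
        W = 1.0 - e                     # Weibull CDF W(z)
        return z ** i * W ** r * w

    h = z_max / n
    total = integrand(0.0) + integrand(z_max)
    for j in range(1, n):
        total += (4.0 if j % 2 else 2.0) * integrand(j * h)
    return total * h / 3.0

# Small table of basis integrals for r, i = 0..2.
for r in range(3):
    print([round(weibull_pwm_basis(r, i), 6) for i in range(3)])
```

Integrating over $z$ rather than over $u$ avoids the endpoint singularity of
$[-\ln(1-u)]^{i/4}$ at $u = 1$.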

The PWM of the observed data is calculated as follows. Sort the sample into ascending
order $x_1 \le x_2 \le \cdots \le x_m$; the unbiased estimate of $\beta_r$ is [?]:

$$ \beta_r = \frac{1}{m} \sum_{s=1}^{m} \frac{(s-1)(s-2)\cdots(s-r)}{(m-1)(m-2)\cdots(m-r)}\, x_s \tag{8} $$
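
The estimator of Eq. (8) translates directly into a few lines of Python. The function name
and the four-point data set are illustrative only.

```python
def sample_pwm(data, r):
    """Unbiased estimate of beta_r from a sample, following Eq. (8)."""
    xs = sorted(data)                     # ascending order x_1 <= ... <= x_m
    m = len(xs)
    if m <= r:
        raise ValueError("need more than r observations")
    total = 0.0
    for s, x in enumerate(xs, start=1):   # s is the 1-based rank
        weight = 1.0
        for j in range(1, r + 1):
            weight *= (s - j) / (m - j)
        total += weight * x
    return total / m

data = [3.0, 1.0, 4.0, 2.0]
print(sample_pwm(data, 0))   # r = 0 gives the sample mean
print(sample_pwm(data, 1))
print(sample_pwm(data, 2))
```

For $r = 0$ every weight equals 1, so the estimate reduces to the sample mean; for larger
$r$ the smallest $r$ observations receive zero weight.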


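For a polynomial of degree $n$, calculate the first $(n + 1)$ PWMs of the data
($r = 0, \ldots, n$) and substitute these PWMs into Eq. (7). With $Q_{r,i,0}$ denoting the
integrals defined in Eq. (7), the following system of linear equations is established:

$$
\begin{pmatrix}
Q_{0,0,0} & \cdots & Q_{0,n,0} \\
\vdots & & \vdots \\
Q_{r,0,0} & \cdots & Q_{r,n,0} \\
\vdots & & \vdots \\
Q_{n,0,0} & \cdots & Q_{n,n,0}
\end{pmatrix}
\begin{pmatrix} a_0 \\ \vdots \\ a_i \\ \vdots \\ a_n \end{pmatrix}
=
\begin{pmatrix} \beta_0 \\ \vdots \\ \beta_r \\ \vdots \\ \beta_n \end{pmatrix}
\tag{9}
$$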


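Solving the system of linear equations above gives the coefficients of the polynomial
model. Then, with $u = F(x)$, the PDF is:

$$ f(x) = \frac{1}{[F^{-1}(u)]'} = \left\{ \sum_{i=1}^{n} \frac{i\, a_i}{4} [-\ln(1-u)]^{\frac{i}{4} - 1} \cdot \frac{1}{1-u} \right\}^{-1} \tag{10} $$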
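
A sketch of how Eq. (10) can be evaluated is given below: the PDF is traced parametrically
in $u$, returning the pair $(x, f(x))$. The shape parameter is kept as an argument `k`
(4 in the text), and the Rayleigh coefficients $[0, 0, \sqrt{2}]$ are used only because
the exact density $x e^{-x^2/2}$ is then available for comparison; both choices are
assumptions of this example.

```python
import math

def model_pdf_point(u, coeffs, k=4.0):
    """Return (x, f(x)) at probability level u for x = sum_i a_i z^i, z ~ W(1, k).

    f(x) = 1 / (dx/du), where
    dx/du = sum_{i>=1} (i * a_i / k) * t^(i/k - 1) / (1 - u) and t = -ln(1 - u).
    """
    t = -math.log1p(-u)
    x = sum(a * t ** (i / k) for i, a in enumerate(coeffs))
    dx_du = sum(i * a / k * t ** (i / k - 1.0)
                for i, a in enumerate(coeffs) if i >= 1) / (1.0 - u)
    return x, 1.0 / dx_du

# Illustrative check against the Rayleigh density x * exp(-x^2 / 2).
rayleigh_coeffs = [0.0, 0.0, math.sqrt(2.0)]
for u in (0.1, 0.5, 0.9):
    x, fx = model_pdf_point(u, rayleigh_coeffs)
    print(f"u={u} x={x:.6f} f(x)={fx:.6f} exact={x * math.exp(-x * x / 2.0):.6f}")
```

The formula is only meaningful where the fitted polynomial is monotone in $u$; if
$dx/du$ approaches zero, the computed density blows up.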


It should be noted that the coefficient matrix in Eq. (9) becomes nearly singular as $n$
increases. Biased values of $a_i$ would be obtained for a polynomial model of too high a
degree.
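
The assembly and solution of the linear system in Eq. (9) can be sketched as follows. Two
points are assumptions of this example rather than statements from the paper. First, the
matrix entries are computed with a closed form obtained by substituting
$t = -\ln(1-u)$ and expanding $(1 - e^{-t})^r$ binomially, which gives
$\Gamma(1 + i/k) \sum_{j=0}^{r} (-1)^j \binom{r}{j} (j+1)^{-(1+i/k)}$; the text itself
integrates numerically. Second, the check is a round trip with hypothetical coefficients
of degree 2, not a fit to real data.

```python
import math

def weibull_pwm_basis(r, i, k=4.0):
    """Integral of [W^{-1}(u)]^i * u^r over (0, 1) for W(1, k), via binomial expansion."""
    p = 1.0 + i / k
    return math.gamma(p) * sum((-1) ** j * math.comb(r, j) / (j + 1) ** p
                               for j in range(r + 1))

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_pwm(betas, k=4.0):
    """Solve Eq. (9): sum_i Q[r][i] * a_i = beta_r for r = 0..n."""
    n = len(betas) - 1
    Q = [[weibull_pwm_basis(r, i, k) for i in range(n + 1)] for r in range(n + 1)]
    return solve_linear(Q, betas)

# Round trip with hypothetical coefficients: build the model PWMs from known a_i,
# then recover them by solving the linear system.
true_a = [1.0, 2.0, 0.5]
betas = [sum(a * weibull_pwm_basis(r, i) for i, a in enumerate(true_a))
         for r in range(len(true_a))]
print(fit_pwm(betas))
```

The alternating sum in the closed form loses accuracy as $r$ grows, which compounds the
ill-conditioning mentioned above; in practice the PWMs of the data would come from the
sample estimator of Eq. (8).
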

The generalized lambda distribution (GLD) [?] and the Johnson system [11] are two
families of distributions commonly used for data fitting, both of which are four-parameter
distributions. These two families allow for the control of the first four moments of
the data, and are often limited to unimodal distributions. In contrast, the proposed
method is capable of controlling much higher moments, and can accommodate some
distributions that are difficult for the GLD and the Johnson system.


4. Example 

In this section, numerical examples are performed in Matlab to check the proposed 
method. 

As for calculating the quantile function, six examples are performed, associated with the
lognormal distribution $\ln N(0, 1)$, the standard normal distribution $N(0, 1)$, the Gamma
distribution $\Gamma(10, 1)$, the Beta distribution $Beta(1.5, 1.5)$, the Rayleigh
distribution $R(1)$ and the Chi-square distribution $\chi^2(3)$. All these distributions
are simulated by the PM based polynomial model of degree 20. The PDFs are depicted in
Figure 1. The values of $\varepsilon_k$ are presented in Table 2. Inspection of Figure 1
and Table 2 indicates the accuracy of the PM method.


Table 2: The absolute relative error

Distribution        Average (%)              Minimum (%)               Maximum (%)
$\ln N(0,1)$        $1.8 \times 10^{-?}$     0                         0.015
$N(0,1)$            $5.1 \times 10^{-4}$     $5.1 \times 10^{-44}$     0.16
$\Gamma(10,1)$      $1.4 \times 10^{-4}$     0                         0.074
$Beta(1.5,1.5)$     $1.1 \times 10^{-?}$     0                         0.0085
$R(0.5)$            $2.1 \times 10^{-?}$     0                         0.0015
$\chi^2(3)$         $2.6 \times 10^{-?}$     0                         0.030


Here, an example associated with data fitting is performed. Consider the random
variable $x$:

$$ x = x_c + a\, x_d \tag{11} $$

where $x_c$ is a continuous random variable with standard normal distribution $N(0, 1)$,
$x_d$ is a discrete random variable with binomial distribution $B(1, 0.5)$, and $a$ is a
real constant.

Setting $a = 2.4$, $10^{?}$ samples are generated. A 20th order polynomial model is
employed, and the coefficients are determined by the PWM matching method.
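
The test data of Eq. (11) can be generated with a few lines of Python, as sketched below.
The sample size of 100,000 and the seed are assumptions of this sketch. The sample mean
and variance are printed for comparison with the theoretical values $a p = 1.2$ and
$1 + a^2 p (1 - p) = 2.44$ for $a = 2.4$, $p = 0.5$.

```python
import random

def sample_mixture(n_samples, a=2.4, p=0.5, seed=0):
    """Draw x = x_c + a * x_d with x_c ~ N(0, 1) and x_d ~ B(1, p), as in Eq. (11)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) + a * (1.0 if rng.random() < p else 0.0)
            for _ in range(n_samples)]

xs = sample_mixture(100_000)   # sample size is an assumption of this sketch
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
print(mean, var)               # theory: mean 1.2, variance 2.44
```

The resulting density has two modes, near 0 and near 2.4, which is what makes it hard for
four-parameter unimodal families.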

The GLD and the Johnson system are also employed to represent the data. The parameters of
the GLD are estimated by the L-moment matching method [?], and the Johnson system is
determined by the standard moment matching method [8]. The PDFs are depicted in Figure 2.

The PDF of $x$ is close to binomial, which is difficult to simulate with the GLD and the
Johnson system. In contrast, the proposed model gives a good representation of the
samples. In the upper tail of the PDF, the value of $u$ is close to 1 and $f(x)$ in
Eq. (10) tends to be infinite, leading to the cusp in the right part of the PDF in
Figure 2(c).

Fig. 1: The PDFs of probability distributions: (a) lognormal distribution $\ln N(0, 1)$;
(b) normal distribution $N(0, 1)$; (c) Gamma distribution $\Gamma(10, 1)$; (d) Beta
distribution $Beta(1.5, 1.5)$; (e) Rayleigh distribution $R(1)$; (f) Chi-square
distribution $\chi^2(3)$

Fig. 2: The PDFs of the samples: (a) Johnson system fit; (b) GLD fit; (c) polynomial
model fit


5. Conclusion 

A Weibull distribution based polynomial transformation model is proposed in this paper.
The quantile functions of various distributions are obtained by the percentile matching
method, which shows high accuracy in the probit range $[10^{-4}, 1 - 10^{-4}]$. When the
polynomial model is employed to fit distributions to data, the probability weighted
moments matching method is used to determine the coefficients. Numerical examples
demonstrate that the proposed model gives a superior simulation compared with the GLD and
the Johnson system.